80 research outputs found

    "Is a picture of a bird a bird": Policy recommendations for dealing with ambiguity in machine vision models

    Many questions that we ask about the world do not have a single clear answer, yet typical human annotation set-ups in machine learning assume there must be a single ground truth label for all examples in every task. The divergence between reality and practice is stark, especially in cases with inherent ambiguity and where the range of different subjective judgments is wide. Here, we examine the implications of subjective human judgments in the behavioral task of labeling images used to train machine vision models. We identify three primary sources of ambiguity arising from (i) depictions of labels in the images, (ii) raters' backgrounds, and (iii) the task definition. On the basis of the empirical results, we suggest best practices for handling label ambiguity in machine learning datasets.

    Two Failures of Self-Consistency in the Multi-Step Reasoning of LLMs

    Large language models (LLMs) have achieved widespread success on a variety of in-context few-shot tasks, but this success is typically evaluated via correctness rather than consistency. We argue that self-consistency is an important criterion for valid multi-step reasoning in tasks where the solution is composed of the answers to multiple sub-steps. We propose two types of self-consistency that are particularly important for multi-step reasoning: hypothetical consistency (a model's ability to predict what its output would be in a hypothetical other context) and compositional consistency (consistency of a model's final outputs when intermediate sub-steps are replaced with the model's outputs for those steps). We demonstrate that multiple variants of the GPT-3/-4 models exhibit poor consistency rates across both types of consistency on a variety of tasks. Comment: Added GPT-4 result

    BLiMP: The Benchmark of Linguistic Minimal Pairs for English

    We introduce The Benchmark of Linguistic Minimal Pairs (shortened to BLiMP), a challenge set for evaluating what language models (LMs) know about major grammatical phenomena in English. BLiMP consists of 67 sub-datasets, each containing 1000 minimal pairs isolating specific contrasts in syntax, morphology, or semantics. The data is automatically generated according to expert-crafted grammars, and aggregate human agreement with the labels is 96.4%. We use it to evaluate n-gram, LSTM, and Transformer (GPT-2 and Transformer-XL) LMs. We find that state-of-the-art models identify morphological contrasts reliably, but they struggle with semantic restrictions on the distribution of quantifiers and negative polarity items and subtle syntactic phenomena such as extraction islands. Comment: To appear in TAC
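
A minimal sketch of how accuracy on minimal pairs could be computed, assuming the common convention that a model is counted as correct on a pair when it scores the acceptable sentence above the unacceptable one. The names `minimal_pair_accuracy` and `toy_score` are hypothetical, and the toy scorer is only a stand-in for a real LM log-probability function; none of this is taken from the BLiMP code base.

```python
from typing import Callable, Iterable, Tuple


def minimal_pair_accuracy(
    pairs: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
) -> float:
    """Fraction of (acceptable, unacceptable) pairs where the acceptable
    sentence receives a strictly higher score."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one minimal pair")
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return correct / len(pairs)


def toy_score(sentence: str) -> float:
    # Hypothetical stand-in for an LM log-probability: it only penalises
    # the plural-subject agreement error "cats is".
    return -1.0 if "cats is" in sentence else 0.0


# Illustrative pairs written for this sketch, not drawn from the benchmark.
pairs = [
    ("The cats are asleep.", "The cats is asleep."),
    ("The cat is asleep.", "The cat are asleep."),
]
accuracy = minimal_pair_accuracy(pairs, toy_score)
print(accuracy)
```

The toy scorer catches the first contrast but ties on the second, so only one of the two pairs counts as correct. Swapping `toy_score` for a function that returns a sentence's total log-probability under a real language model would give the usual per-phenomenon accuracy.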

    Evolutionary Reconstructions of the Transferrin Receptor of Caniforms Supports Canine Parvovirus Being a Re-emerged and Not a Novel Pathogen in Dogs

    Parvoviruses exploit transferrin receptor type-1 (TfR) for cellular entry in carnivores, and specific interactions are key to control of host range. We show that several key mutations acquired by TfR during the evolution of Caniforms (dogs and related species) modified the interactions with parvovirus capsids by reducing the level of binding. These data, along with signatures of positive selection in the TFRC gene, are consistent with an evolutionary arms race between the TfR of the Caniform clade and parvoviruses. In addition to the amino acid changes that modify binding, we found that a glycosylation site mutation in the TfR of dogs, which provided resistance to the carnivore parvoviruses in circulation prior to about 1975, predates the speciation of coyotes and dogs. Because the closely related black-backed jackal has a TfR similar to that of their common ancestor and lacks the glycosylation site, reconstructing this mutation into the jackal TfR shows the potency of that site in blocking binding and infection and explains the resistance of dogs until recent times. This alters our understanding of this well-known example of viral emergence by indicating that canine parvovirus emergence likely resulted from the re-adaptation of a parvovirus to the resistant receptor of a former host.

    DataPerf: Benchmarks for Data-Centric AI Development

    Machine learning research has long focused on models rather than datasets, and prominent datasets are used for common ML tasks without regard to the breadth, difficulty, and faithfulness of the underlying problems. Neglecting the fundamental importance of data has given rise to inaccuracy, bias, and fragility in real-world applications, and research is hindered by saturation across existing dataset benchmarks. In response, we present DataPerf, a community-led benchmark suite for evaluating ML datasets and data-centric algorithms. We aim to foster innovation in data-centric AI through competition, comparability, and reproducibility. We enable the ML community to iterate on datasets, instead of just architectures, and we provide an open, online platform with multiple rounds of challenges to support this iterative development. The first iteration of DataPerf contains five benchmarks covering a wide spectrum of data-centric techniques, tasks, and modalities in vision, speech, acquisition, debugging, and diffusion prompting, and we support hosting new contributed benchmarks from the community. The benchmarks, online evaluation platform, and baseline implementations are open source, and the MLCommons Association will maintain DataPerf to ensure long-term benefits to academia and industry. Comment: NeurIPS 2023 Datasets and Benchmarks Trac

    Theory and description in African Linguistics: Selected papers from the 47th Annual Conference on African Linguistics

    The papers in this volume were presented at the 47th Annual Conference on African Linguistics (ACAL) at UC Berkeley in 2016. The papers offer new descriptions of African languages and propose novel theoretical analyses of them. The contributions span topics in phonetics, phonology, syntax, semantics, and pragmatics and reflect the typological and genetic diversity of languages in Africa. Four papers in the volume examine Areal Features and Linguistic Reconstruction in Africa, and were presented at a special workshop on this topic held alongside the general session of ACAL.
